This utility scans a site, HTML page by HTML page, for links to geospatial data, identifying them by their file extensions. As each page is downloaded, the robot parses it for hypertext links that match a particular site domain. Links that reference HTML pages are placed in a 'to do' file, which the robot uses to traverse the site. Links that reference geospatial datasets, as well as certain graphic file formats, are placed in a 'links' file that the user can view later. So at the end of a run you have a text file of HTML links showing where the robot has been, and an HTML page showing what the robot has picked up along the way. Geospatial data links are also entered into a database that can be queried much like other Web robot sites. The robot also takes the <title> </title> of each traversed HTML page and uses it as a description for the links found on that page. The following file formats are supported. The robot now handles sites that use FRAMES.
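The traversal logic described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual source; the function names, the extension list, and the helper class are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed extension list for illustration; the real program's list is
# given elsewhere on the page.
GEO_EXTS = {".dem", ".dlg", ".e00", ".shp", ".tif", ".gif", ".jpg"}

class LinkParser(HTMLParser):
    """Collects <a href> targets and the page <title> text."""
    def __init__(self):
        super().__init__()
        self.links, self.title = [], ""
        self._in_title = False
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
    def handle_data(self, data):
        if self._in_title:
            self.title += data

def classify(page_url, html, domain):
    """Split a page's links into HTML pages for the 'to do' file and
    geospatial/graphic data links for the 'links' file."""
    parser = LinkParser()
    parser.feed(html)
    todo, found = [], []
    for href in parser.links:
        url = urljoin(page_url, href)
        if urlparse(url).netloc != domain:
            continue  # stay inside the entered site domain
        name = url.rsplit("/", 1)[-1]
        ext = "." + name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if ext in (".html", ".htm"):
            todo.append(url)  # HTML page: queue for traversal
        elif ext in GEO_EXTS:
            # data link, described by the page's <title>
            found.append((url, parser.title.strip()))
    return todo, found
```

A run of the robot would repeatedly pop a URL from the 'to do' list, download it, call something like `classify` on it, and append the results to the two output files.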
The sites listed on this page are known to work and can be used to demonstrate this software. The RDBMS is disabled in this version of the program, so you can use this utility strictly as a search tool, on any site and as many times as you want.
On some sites the geospatial datasets are linked from HTML pages that are generated by CGI scripts. The HTTP utility that I use cannot access these pages and therefore cannot parse them.
Note that the robot traverses HTML pages by taking links off of pages it has previously visited. If the initial page has no HTML links on it, then obviously the robot cannot get past that page. You can check this by pressing Ctrl-U in your browser and looking for <a href=" "> </a> fields. It's best to start the robot on pages that have many links corresponding to the site domain you entered.
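As a quick sanity check of a candidate start page, you could count how many of its links stay within the target domain, which is essentially what the Ctrl-U inspection does by eye. The function name and the regex are illustrative assumptions, not part of the robot itself.

```python
import re

def count_domain_links(html, domain):
    """Count <a href> links in a page's source that point into the
    given site domain (relative links are assumed to stay on-site)."""
    hrefs = re.findall(r'<a\s+[^>]*href="([^"]+)"', html, re.IGNORECASE)
    return sum(1 for h in hrefs
               if domain in h or not h.startswith("http"))
```

A start page that scores zero here would stop the robot immediately.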
Go back to the Search Page.